Abstract



A typical computational geometry problem begins: Consider a set $P$ of $n$ points in $\mathbb{R}^d$. However, many
applications today work with input that is not precisely known, for example when the data is sensed and has some
known error model. What if we do not know the set $P$ exactly, but rather we have a probability distribution $\mu_p$
governing the location of each point $p \in P$?

Consider a set of (non-fixed) points $P$, and let $\mu_P$ be the probability distribution of this set. We study several
measures (e.g. the radius of the smallest enclosing ball, or the area of the smallest enclosing box) with respect
to $\mu_P$. The solutions to these problems do not, as in the traditional case, consist of a single answer, but rather a
distribution of answers. We hence describe a data structure, called an $\varepsilon$-quantization, that can approximate such a
distribution within $\varepsilon$ in $O(1/\varepsilon)$ space. We also extend this data structure to answer higher dimensional queries of
$\mu_P$ (e.g. the length and width of the smallest enclosing box in $\mathbb{R}^2$).

Rather than compute a new data structure for each measure we are interested in, we can also compute a single
data structure that allows us to answer many questions at once. This data structure, an $(\varepsilon, \alpha)$-kernel, is based on
$\alpha$-kernel coresets and can be used to create approximate $\varepsilon$-quantizations for geometric problems involving extent
measures.

Thirdly, we introduce a data structure that can answer questions of the type 'what is the probability that point
$q$ is in the smallest enclosing ball of $P$?' For a given distribution $\mu_P$ and summarizing shape (e.g. the smallest
enclosing ball), we define an $\varepsilon$-shape inclusion probability function to be a function that assigns to a query point
$q$ a value that is at most $\varepsilon$ away from the probability that $q$ is contained in this summarizing shape of $P$.
This results in a probability description more directly linked to the space that the input points live in.

We provide simple and efficient randomized algorithms for computing all of these data structures, which are
easy to implement and practical. We provide some experimental results to support this. We also provide more
involved deterministic algorithms for $\varepsilon$-quantizations for problems involving shapes with bounded VC-dimension
that run in time polynomial in $n$ and $1/\varepsilon$.



1 Introduction 



The input for a typical computational geometry problem is a set $P$ of $n$ points in $\mathbb{R}^2$, or more generally $\mathbb{R}^d$.
Traditionally, such a set of points is assumed to be known exactly, and indeed, in the 1980s and 1990s such an assumption
was often justified because much of the input data was hand-constructed for computer graphics or simulations.
However, in many modern applications the input is sensed from the real world, and such data is inherently imprecise.
Therefore, there is a growing need for methods that are able to deal with imprecision.

An early model to quantify imprecision in geometric data, motivated by finite precision of coordinates, is
$\varepsilon$-geometry, introduced by Guibas et al. [10]. In this model, the input is given by a traditional point set $P$, where the
imprecision is modeled by a single extra parameter $\varepsilon$. The true point set is not known, but it is certain that for each
point in $P$ there is a point in the disk of radius $\varepsilon$ around it. This model has proven fruitful and is still often used due
to its simplicity. To name a few examples, Guibas et al. [11] define strongly convex polygons: polygons that are guaranteed
to stay convex, even when the vertices are perturbed by $\varepsilon$. Bandyopadhyay and Snoeyink [3] compute the set of all
potential simplices in $\mathbb{R}^2$ and $\mathbb{R}^3$ that could belong to the Delaunay triangulation. Held and Mitchell [13] and Löffler
and Snoeyink [15] study the problem of preprocessing a set of imprecise points under this model, so that when the
true points are specified later some computation can be done faster.

A more involved model for imprecision can be obtained by not specifying a single e for all the points, but allowing 
a different radius for each point, or even other shapes of imprecision regions. This allows for modeling imprecision 
that comes from different sources, independent imprecision in different dimensions of the input, etc. This extra 
freedom in modeling comes at the price of more involved algorithmic solutions, but still many results are available. 
Nagai and Tokura [19] compute the union and intersection of all possible convex hulls to obtain bounds on any
possible solution, as do Ostrovsky-Berman and Joskowicz [20] in a setting allowing some dependence between
points. Van Kreveld and Löffler [23] study the problem of computing the smallest and largest possible values of
several geometric extent measures, such as the diameter or the radius of the smallest enclosing ball, where the points
are restricted to lie in given regions in the plane. Krüger [14] extends some of these results to higher dimensions.

However, some applications dealing with sensed data provide more information about the imprecision than just
a region, and a probability distribution governing the expected location of each point may be available. In robotic
mapping [8] careful error models are used to govern the laser range finder data. In data mining [2] original data is
often perturbed by a known model for privacy preserving purposes. In databases [6] large data sets may be
summarized as probability distributions to store them more compactly. The atoms of a protein structure have probabilistic
distributions as determined by NMR spectroscopy reconstruction algorithms [22], rotamers, or other variability.
Similarly, probability distribution models are produced for GIS data, data from sensor networks, astronomical data,
and many other sources. In these cases, the above threshold error models could be adapted to this data by choosing
an error distance beyond which the probability is below a certain threshold. However, the solutions produced under
the threshold error models depend heavily on the boundary cases of the error model, while it is reasonable to expect
the points are more likely to appear near the "center" of the regions. Working directly with probability distributions
can provide more accurate answers to geometric questions about such sets of points.

This paper studies the computation of extent measures on uncertain point sets governed by probability
distributions. Unsurprisingly, directly using the probability distribution error model creates harder algorithmic problems,
and many questions may be impossible to answer exactly under this model. But since the data is imprecise to begin 
with, it is also reasonable to construct approximate answers. Our algorithms have approximation guarantees with 
respect to the original distributions, not an approximation of them. Instead of reinventing computational geometry 
for probability distributions, this paper reduces problems on data governed by probability distributions to discrete 
and well-studied computational geometry problems on precise point sets. 

1.1 Problem Statement 

Let $\mu_p : \mathbb{R}^d \to \mathbb{R}^+$ describe the probability distribution of a point $p$ where $\int_{x \in \mathbb{R}^d} \mu_p(x) \, dx = 1$. Let
$\mu_P : \mathbb{R}^d \times \mathbb{R}^d \times \ldots \times \mathbb{R}^d \to \mathbb{R}^+$ describe the distribution of a point set $P$ by the joint probability over each $p \in P$. For
simplicity we refer to the space $\mathbb{R}^d \times \ldots \times \mathbb{R}^d$ as $\mathbb{R}^{dn}$ when it is a product of $n$ $d$-dimensional spaces. For this paper
we will assume $\mu_P(q_1, q_2, \ldots, q_n) = \prod_{i=1}^{n} \mu_{p_i}(q_i)$, so the distribution for each point is independent, although for
our randomized algorithms this restriction can be easily circumvented.

Given a distribution $\mu_P$ we can ask questions traditionally asked of point sets that are given precisely, instead of
as distributions (e.g. the diameter or the axis-aligned bounding box). In the presence of imprecision, the answer to
such a question is not a single value or structure, but rather a distribution of answers. The point of this paper is not
just how to answer geometric questions about these distributions, but how to concisely represent them.



$\varepsilon$-Quantizations. Let $f : \mathbb{R}^{dn} \to \mathbb{R}$ be a single-valued function on a fixed point set, such as the radius of the
minimum enclosing ball. For a query value $v$,

$$f_{\mu_P}(v) = \int \mathbb{1}(f(Q) \le v) \cdot \mu_P(Q) \, dQ,$$

where $Q$ is taken over all size $n$ point sets in $\mathbb{R}^d$ and $\mathbb{1}(\cdot)$ is the indicator function, is the probability that $f$ will
yield a value less than or equal to $v$, given the distribution $\mu_P$. Then $f_{\mu_P}$ is the cumulative density function of the
distribution of possible values that $f$ can take. Ideally, we would return the function $f_{\mu_P}$ so we could quickly answer
any query exactly; however, it is not clear how to compute closed forms for such functions for one specific value,
let alone all values. Rather, we introduce a data structure, which we call an $\varepsilon$-quantization, to answer such queries
approximately and efficiently. For an isotonic function $f_{\mu_P}$ and any value $v$, an $\varepsilon$-quantization, $R$, guarantees that
$|R(v) - f_{\mu_P}(v)| \le \varepsilon$. Furthermore, the size of an $\varepsilon$-quantization is always dependent only on $\varepsilon$, not on $|P|$ or $\mu_P$.

Sometimes a statistic for a point set has multiple values, such as the width of the minimum enclosing axis-aligned
rectangle along the $x$-axis and the $y$-axis. For a function $f : \mathbb{R}^{dn} \to \mathbb{R}^k$ let

$$f_{\mu_P}(v_1, \ldots, v_k) = \int \mathbb{1}(f(Q) \preceq (v_1, \ldots, v_k)) \cdot \mu_P(Q) \, dQ,$$

where for a point $p \in \mathbb{R}^k$ the operation $p \preceq (v_1, \ldots, v_k)$ determines whether $p_i \le v_i$ for each $i$, where $p_i$ is
the $i$th coordinate of $p$. Note that $f_{\mu_P}$ must be isotonic in the sense that for two points $p, q \in \mathbb{R}^k$ if $p \preceq q$ then
$f_{\mu_P}(p) \le f_{\mu_P}(q)$. A $k$-variate $\varepsilon$-quantization $R$ for an isotonic function $f_{\mu_P} : \mathbb{R}^k \to [0, 1]$ and for a query $v \in \mathbb{R}^k$
guarantees $|R(v) - f_{\mu_P}(v)| \le \varepsilon$. The size of a multivariate $\varepsilon$-quantization is dependent only on $\varepsilon$ and $k$.



$(\varepsilon, \alpha)$-Kernels. Rather than compute a new data structure for each measure we are interested in, we can also
compute a single data structure that allows us to answer many questions at once. For an isotonic function
$f_{\mu_P} : \mathbb{R}^+ \to [0, 1]$ an $(\varepsilon, \alpha)$-quantization $M$ guarantees that for any query $x$ there exists a point $x'$ such that (1) $|x - x'| \le \alpha x$ and (2)
$|M(x) - f_{\mu_P}(x')| \le \varepsilon$. An $(\varepsilon, \alpha)$-kernel is a data structure that can produce an $(\varepsilon, \alpha)$-quantization for $f_{\mu_P}$ where $f$
measures the width in any direction and whose size depends only on $\varepsilon$ and $\alpha$.

Shape Inclusion Probabilities. To summarize a point set $P \subset \mathbb{R}^d$, we often approximate it with a shape, such
as the smallest enclosing ball. For $k$-variate $\varepsilon$-quantizations with large $k$, it can be hard to visualize the connection
to the ambient $d$-dimensional space of the data points (i.e. for smallest enclosing ball we could use a $(d+1)$-variate
$\varepsilon$-quantization to measure the $d$ coordinates of the center point and the radius). Instead, for a summarizing shape
we may wish to study a shape inclusion probability function $h_{\mu_P} : \mathbb{R}^d \to [0, 1]$ (or sip function) which describes
the probability that a given point $x \in \mathbb{R}^d$ is included in the summarizing shape.* Again, there does not seem to
be a closed form for many of these functions. Rather we calculate an $\varepsilon$-sip function $\hat{h} : \mathbb{R}^d \to [0, 1]$ such that
$\forall_{x \in \mathbb{R}^d} \; |h_{\mu_P}(x) - \hat{h}(x)| \le \varepsilon$. The size of an $\varepsilon$-sip depends only on $\varepsilon$ and the complexity of the summarizing shape.



*For technical reasons, if there are (degenerately) multiple optimal summarizing shapes, we say each is equally likely to be the
summarizing shape of the point set.

1.2 Our Results

We describe simple and practical randomized algorithms for computing $\varepsilon$-quantizations, $\varepsilon$-sip functions, and
$(\varepsilon, \alpha)$-kernels. Let $T_f(n)$ be the time it takes to calculate a summarizing shape of a set of $n$ points $Q \subset \mathbb{R}^d$, which generates
a statistic f{Q). We can calculate an e-quantization of /^j^, with probability 1 — S, in time 0(Tj(n)^ log 
For univariate e-quantizations the size is O(^), and for fe-variate e-quantizations the size is 0(A:^^ log^'^ ^). With 
probability 1—S, we can calculate an e-sip function of size 0{-^ log ^) in time 0(Ty(n)^ log With probability 
1 — (5, we can calculate an (e, Q)-kernel of size 0( ^(rfii)/2 ^ log ^) in time 0((n + ^^-3/2 ) ^ log of these 

randomized algorithms are simple and practical, as demonstrated by some experimental results. 

In addition, we provide deterministic algorithms for computing $\varepsilon$-quantizations of a specific class of functions. If
$\mathcal{A}$ is a family of geometric shapes, such that $(\mathbb{R}^d, \mathcal{A})$ has bounded VC-dimension, and $f : \mathbb{R}^{dn} \to \mathbb{R}^k$ is a function
that describes some statistics on the smallest element from $\mathcal{A}$ that encloses the points (e.g. the radius of the smallest
enclosing ball), then an $\varepsilon$-quantization for $f$ can be computed in deterministic time $O(\mathrm{poly}(n, 1/\varepsilon))$, as described in
Table 1.

This paper describes results for shape fitting problems for distributions of point sets in $\mathbb{R}^d$. In particular, we will
use the smallest enclosing ball and the axis-aligned bounding box as running examples in the algorithm descriptions.
We believe, though, that the concept of $\varepsilon$-quantizations should extend to many other problems with uncertain data.
In fact, variations of our randomized algorithm should work for a more general array of problems.



2 Preliminaries: $\varepsilon$-Samples and $\alpha$-Kernels

$\varepsilon$-Samples. For a set $P$ (in our context a point set), let $\mathcal{A}$ be a set of subsets of $P$ which for instance could be
induced by containment in a shape from some family of geometric shapes. The pair $(P, \mathcal{A})$ is called a range space.
We say that $Q$ is an $\varepsilon$-sample of $(P, \mathcal{A})$ if

$$\max_{R \in \mathcal{A}} \left| \frac{\phi(R \cap Q)}{\phi(Q)} - \frac{\phi(R \cap P)}{\phi(P)} \right| \le \varepsilon,$$

where $|\cdot|$ takes the absolute value and $\phi(\cdot)$ returns the measure of a point set. In the discrete case $\phi(Q)$ returns the
cardinality of $Q$. We say $\mathcal{A}$ shatters a set $S$ if every subset of $S$ is equal to $R \cap S$ for some $R \in \mathcal{A}$. The cardinality
of the largest discrete set $S \subseteq P$ that $\mathcal{A}$ can shatter is known as the VC-dimension of $(P, \mathcal{A})$.
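
As a concrete instance of this definition, the sketch below measures how well a subset $Q$ represents a finite set $P$ of real numbers when the ranges are the one-sided intervals $(-\infty, t]$. The function name `interval_discrepancy` and the choice of this particular range family are assumptions made for illustration only; under them, $Q$ is an $\varepsilon$-sample exactly when the returned value is at most $\varepsilon$.

```python
from bisect import bisect_right

def interval_discrepancy(P, Q):
    """Max over ranges (-inf, t] of | |R n Q|/|Q| - |R n P|/|P| | for finite sets of reals."""
    P_sorted, Q_sorted = sorted(P), sorted(Q)
    worst = 0.0
    # Both fractions are step functions of t that only change at input values,
    # so it suffices to test thresholds t taken from P and Q.
    for t in P_sorted + Q_sorted:
        frac_p = bisect_right(P_sorted, t) / len(P_sorted)
        frac_q = bisect_right(Q_sorted, t) / len(Q_sorted)
        worst = max(worst, abs(frac_q - frac_p))
    return worst

P = list(range(1, 11))
print(interval_discrepancy(P, [2, 4, 6, 8, 10]))  # evenly spread subset
print(interval_discrepancy(P, [1, 2, 3, 4, 5]))   # subset clustered at one end
```

The evenly spread subset yields a much smaller discrepancy than the clustered one, which is the behaviour the definition is meant to capture.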

When $(P, \mathcal{A})$ has constant VC-dimension, we can create an $\varepsilon$-sample $Q$ of $(P, \mathcal{A})$, with probability $1 - \delta$, by
uniformly sampling 0{v^ log j^) points from P [24]. There exist deterministic techniques to create e-samples [16, 
5] of size 0{v^ log ^) in time 0{v^^n{^ log '^Y)- When P is a point set in M*^ and the family of ranges is 
determined by inclusion of convex shapes whose sides have one of k predefined normal directions, such as the set of 
axis-aligned boxes, then an e-sample for (P, Q^) of size 0(| log^*^ ^) can be constructed in log^^ ^) time [21]. 

When we have a distribution $\mu : \mathbb{R}^d \to \mathbb{R}^+$, such that $\int_{x \in \mathbb{R}^d} \mu(x) \, dx = 1$, we can think of this as the set $P$
of all points in $\mathbb{R}^d$, where the weight $w$ of a point $p \in \mathbb{R}^d$ is $\mu(p)$. To simplify notation, we write $(\mu, \mathcal{A})$ as a range
space where the ground set is this set $P = \mathbb{R}^d$ weighted by the distribution $\mu$. Let it have VC-dimension $\nu$. For a
distribution $\mu$ that is polygonally approximable [21] with a constant number of facets, we can construct an $\varepsilon$-sample
of size log |) in time 0{-^ log^ |)- ^ longer primer on e-samples is in Appendix A. 

$\alpha$-Kernels. Given a point set $P \subset \mathbb{R}^d$ of size $n$ and a direction $u \in \mathbb{S}^{d-1}$, let $P[u] = \arg\max_{p \in P} \langle p, u \rangle$, where
$\langle \cdot, \cdot \rangle$ is the inner product operator. Let $\omega(P, u) = \langle P[u] - P[-u], u \rangle$ describe the width of $P$ in direction $u$. We say
that $K \subseteq P$ is an $\alpha$-kernel of $P$ if for all $u \in \mathbb{S}^{d-1}$

$$\omega(P, u) - \omega(K, u) \le \alpha \cdot \omega(P, u).$$

$\alpha$-kernels of size $O(1/\alpha^{(d-1)/2})$ can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [4]. Computing many extent related problems
such as diameter and smallest enclosing ball on the $\alpha$-kernel approximates the function on the original set [1].
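
To make the width and kernel condition concrete, here is a minimal planar sketch. The helper names `width` and `is_alpha_kernel` are hypothetical, and the check only tests a finite set of evenly spaced directions, so it can refute the $\alpha$-kernel property but does not prove it for all $u$. It is not the construction of [4].

```python
import math

def width(P, u):
    """omega(P, u): extent of P along direction u, i.e. <P[u] - P[-u], u>."""
    dots = [sum(pi * ui for pi, ui in zip(p, u)) for p in P]
    return max(dots) - min(dots)

def is_alpha_kernel(K, P, alpha, num_directions=360):
    """Check omega(P,u) - omega(K,u) <= alpha * omega(P,u) on sampled planar directions."""
    for i in range(num_directions):
        theta = 2 * math.pi * i / num_directions
        u = (math.cos(theta), math.sin(theta))
        # Small tolerance guards against floating-point noise in the comparison.
        if width(P, u) - width(K, u) > alpha * width(P, u) + 1e-12:
            return False
    return True

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(is_alpha_kernel(square[:4], square, 0.0))             # the four corners preserve every width
print(is_alpha_kernel([(0.0, 0.0), (1.0, 0.0)], square, 0.5))  # one edge loses the vertical width
```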

Figure 1: (a) The true form of the function. (b) The $\varepsilon$-quantization as a point set in $\mathbb{R}$. (c) The inferred curve in $\mathbb{R}^2$.
(d) Overlay of the two images.



3 Randomized Algorithm for $\varepsilon$-Quantizations

We start with a general algorithm (Algorithm 3.1) which will be made specific in several places in the paper. We
assume we can draw a point from $\mu_{p_j}$ for each $p_j \in P$ in constant time; if the time depends on some other parameters,
the runtimes can be easily adjusted.

Algorithm 3.1 Approximate $\mu_P$ with regard to a family of shapes $\mathcal{S}$ or function $f_{\mathcal{S}}$

1: for i = 1 to m = log ^) do 

2:     for $p_j \in P$ do

3:         Generate $q_j \in \mu_{p_j}$.

4:     Set $V_i = f_{\mathcal{S}}(\{q_1, q_2, \ldots, q_n\})$.
5: Reduce or Simplify the set $V = \{V_i\}_{i=1}^{m}$.
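
Steps 1 to 4 of the algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each $\mu_{p_j}$ is represented by a hypothetical sampler callable, the error model is an isotropic Gaussian, the statistic is the diameter, and the number of trials `m` is passed in by the caller rather than derived from $\varepsilon$ and $\delta$.

```python
import itertools
import math
import random

def diameter(Q):
    """Largest pairwise Euclidean distance in the point set Q."""
    return max(math.dist(a, b) for a, b in itertools.combinations(Q, 2))

def gaussian(center, sigma):
    """Sampler for an isotropic Gaussian around `center` (an assumed error model)."""
    return lambda rng: tuple(rng.gauss(c, sigma) for c in center)

def sample_statistic(samplers, f, m, rng):
    """Repeat m times: draw one q_j per distribution, then record V_i = f({q_1..q_n})."""
    values = []
    for _ in range(m):
        Q = [draw(rng) for draw in samplers]
        values.append(f(Q))
    return values

rng = random.Random(0)
samplers = [gaussian(c, 0.1) for c in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]]
V = sample_statistic(samplers, diameter, m=200, rng=rng)
print(min(V), max(V))
```

Step 5 (reducing the set $V$) depends on the data structure being built and is therefore left out of this sketch.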



Defining $\varepsilon$-quantizations. For an isotonic function $h : \mathbb{R} \to [0, 1]$, an $\varepsilon$-quantization, $R$, is a set of points where
for any $t \in \mathbb{R}$, $|h(t) - R(t)| \le \varepsilon$. We let $R(t) = \frac{1}{|R|} \sum_{r \in R} \mathbb{1}(r \le t)$. Since $h$ has range $[0, 1]$ and is isotonic, an
$\varepsilon$-quantization requires only $O(1/\varepsilon)$ points. Figure 1 shows an illustration of how an $\varepsilon$-quantization approximates a
smooth function. Because $h$ is isotonic there exists a function $g : \mathbb{R} \to \mathbb{R}^+$ such that $h(t) = \int_{x=-\infty}^{t} g(x) \, dx$ where
$\int_{x=-\infty}^{\infty} g(x) \, dx = 1$. Thus an $\varepsilon$-sample of $(g, \mathcal{I}_+)$ is an $\varepsilon$-quantization of $h$, where $\mathcal{I}_+$ is all 1-sided intervals.
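
Evaluating a univariate quantization at a query value amounts to counting stored points. The sketch below assumes the point set is kept as an ascending Python list; the name `evaluate_quantization` and the sample values are hypothetical.

```python
from bisect import bisect_right

def evaluate_quantization(R_sorted, t):
    """R(t): fraction of the stored values that are <= t (R_sorted must be ascending)."""
    return bisect_right(R_sorted, t) / len(R_sorted)

R = [0.8, 1.1, 1.3, 1.9]  # hypothetical stored values
print(evaluate_quantization(R, 1.2))
```

Keeping the values sorted makes each query a binary search, so the query time is logarithmic in the size of the quantization.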

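For an isotonic function $h : \mathbb{R}^k \to [0, 1]$ a $k$-variate $\varepsilon$-quantization, $R$, is a set of points in $\mathbb{R}^k$ such that for any
$p \in \mathbb{R}^k$, $|h(p) - R(p)| \le \varepsilon$. For $p \in \mathbb{R}^k$ let $R(p) = \frac{1}{|R|} \sum_{q \in R} \mathbb{1}(q \preceq p)$. Because $h$ is isotonic, there exists a
function $g : \mathbb{R}^k \to \mathbb{R}^+$ such that $h(p) = \int_{x \preceq p} g(x) \, dx$ and $\int_{x \in \mathbb{R}^k} g(x) \, dx = 1$. Thus an $\varepsilon$-sample of $(g, \mathcal{R}_+)$ is an
$\varepsilon$-quantization of $h$, where $\mathcal{R}_+$ describes ranges $R_p \in \mathcal{R}_+$ defined by all $q$ such that $q \preceq p$ for any $p$. See Figure 2
for an illustration of a $k$-variate function $h$ and a $k$-variate $\varepsilon$-quantization approximating it.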
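
The $k$-variate evaluation replaces the comparison $r \le t$ by coordinate-wise dominance. A minimal sketch, with hypothetical function names and a made-up 2-variate point set, using a linear scan rather than any range-counting structure:

```python
def dominated(q, p):
    """True when q_i <= p_i in every coordinate, i.e. q precedes p in the dominance order."""
    return all(qi <= pi for qi, pi in zip(q, p))

def evaluate_quantization(R, p):
    """R(p): fraction of the stored k-dimensional points dominated by the query p."""
    return sum(dominated(q, p) for q in R) / len(R)

R = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 4.0)]  # hypothetical 2-variate quantization
print(evaluate_quantization(R, (3.0, 3.0)))
```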



Figure 2: (a) The true form of the multivariate function. (b) The $\varepsilon$-quantization as a point set in $k$-space. (c) The
inferred surface in $(k+1)$-space. (d) Overlay of the two images.

Algorithm for $\varepsilon$-quantizations. For a function $f$ on a point set $P$ of size $n$, it takes $T_f(n)$ time to evaluate $f(P)$.
We now construct an approximation to $f_{\mu_P}$ by adapting Algorithm 3.1 as follows. First draw a sample point $q_j$ from each $\mu_{p_j}$ for $p_j \in P$,
then evaluate $V_i = f(\{q_1, \ldots, q_n\})$. The fraction of trials of this process that produces a value at most $v$ is the
estimate of $f_{\mu_P}(v)$. Finally reduce the size of $V$ by returning $2/\varepsilon$ evenly spaced points according to the sorted order.

Theorem 3.1. For a distribution $\mu_P$ of $n$ points, there exists a univariate $\varepsilon$-quantization of size $O(1/\varepsilon)$ for $f_{\mu_P}$, and
it can be constructed in 0{Tf{n) \ log ^) time, with success probability 1 — 5, where Tf{n) is the time it takes to 
compute $f(Q)$ for any point set $Q$ of size $n$.

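Proof. Because $f_{\mu_P} : \mathbb{R} \to [0, 1]$ is an isotonic function, there exists another function $g : \mathbb{R} \to \mathbb{R}^+$ such that
$f_{\mu_P}(t) = \int_{x=-\infty}^{t} g(x) \, dx$ where $\int_{x=-\infty}^{\infty} g(x) \, dx = 1$. Thus an $\varepsilon$-sample of $(g, \mathcal{I}_+)$ is an $\varepsilon$-quantization of $f_{\mu_P}$.

By drawing a random sample $q_i$ from each $\mu_{p_i}$ for $p_i \in P$, we are drawing a random point set $Q$ from $\mu_P$. Thus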
f{Q) is a random sample from g. Hence, using the standard randomized construction for e-samples, log ^) 
such samples will generate an $(\varepsilon/2)$-sample for $g$, and hence an $(\varepsilon/2)$-quantization for $f_{\mu_P}$, with probability $1 - \delta$.

Since in an $(\varepsilon/2)$-quantization every value is off from the true function by at most $\varepsilon/2$, we can take an
$(\varepsilon/2)$-quantization of the step function described by the samples and still have an $\varepsilon$-quantization of the true function. Thus, we can reduce this
to an $\varepsilon$-quantization of size $O(1/\varepsilon)$ by taking a subset of $2/\varepsilon$ points spaced evenly according to their sorted order. □
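
The final reduction step can be sketched as below. The text only says the retained points are evenly spaced in sorted order; picking the middle element of each of `size` equal blocks is an assumption of this sketch, as are the function names. With that choice, the reduced set differs from the empirical step function by at most $1/(2 \cdot \text{size})$ in this example.

```python
from bisect import bisect_right

def reduce_quantization(values, size):
    """Keep `size` values evenly spaced in sorted order (the middle of each block)."""
    ordered = sorted(values)
    m = len(ordered)
    return [ordered[min(m - 1, int((i + 0.5) * m / size))] for i in range(size)]

def cdf(sorted_values, t):
    """Fraction of the sorted values that are <= t."""
    return bisect_right(sorted_values, t) / len(sorted_values)

V = list(range(1, 101))          # stand-in for m = 100 sampled statistics
R = reduce_quantization(V, 10)   # keep 10 of them
gap = max(abs(cdf(V, t) - cdf(R, t)) for t in V)
print(R, gap)
```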

We can construct $k$-variate $\varepsilon$-quantizations using the same basic procedure as in Algorithm 3.1. The output $V_i$ of
$f_{\mathcal{S}}$ is $k$-variate and thus results in a $k$-dimensional point. As a result, the reduction of the final size of the point set
requires more advanced procedures.

Theorem 3.2. For a distribution fip of n points, there exists a k-variate e-quantization of size 

fj^p, and it can be constructed in 0{Tf{n) ^ log ^ + ^^^ log^^ ^ log ^) time, with success probability 1 — 6, where 

Tf{n) is the time it takes to compute f{Q)for any point set Q of size n. 

Proof. In the $k$-variate case there exists a function $g : \mathbb{R}^k \to \mathbb{R}^+$ such that $f_{\mu_P}(v) = \int_{x \preceq v} g(x) \, dx$ where
$\int_{x \in \mathbb{R}^k} g(x) \, dx = 1$. Then a random point set $Q$ from $\mu_P$, evaluated as $f(Q)$, is still a random sample from the
A;-variate distribution described by g. Thus, with probability 1 — 5, a set of log ^) such samples is an e-sample 
of $(g, \mathcal{R}_+)$, which has VC-dimension $k$, and the samples are also a $k$-variate $\varepsilon$-quantization of $f_{\mu_P}$.

We can then reduce the size of the e-quantization to 0(| log^'^ ^) [21] (or to log ^) [5]), since the VC- 
dimension is k and each data point requires 0{k) storage. □ 

4 $(\varepsilon, \alpha)$-Kernels

The above construction works for a fixed family of summarizing shapes. This section builds a single data structure,
an $(\varepsilon, \alpha)$-kernel, for a distribution $\mu_P$ in $\mathbb{R}^{dn}$ that can be used to construct $(\varepsilon, \alpha)$-quantizations for several families
of summarizing shapes. In particular, an $(\varepsilon, \alpha)$-kernel of $\mu_P$ is a data structure such that in any query direction
$u \in \mathbb{S}^{d-1}$ we can create an $(\varepsilon, \alpha)$-quantization of $\omega(\cdot, u)$, the width in direction $u$. This data structure introduces
a parameter $\alpha$, which deals with geometric error, in addition to the error parameter $\varepsilon$, which deals with probability
error.

We follow the randomized framework described above as follows. Let $\mathcal{K}$ be an $(\varepsilon, \alpha)$-kernel consisting of $m =$

O(^log^) a-kernels, where each a-kernel Kj approximates a point set Qj drawn randomly from /ip. Given 
$\mathcal{K}$, we can then create an $(\varepsilon, \alpha)$-quantization for the width of $\mu_P$ in any direction $u \in \mathbb{S}^{d-1}$. Specifically, let

$$M = \{\omega(K_j, u)\}_{j=1}^{m}.$$

Lemma 4.1. With probability $1 - \delta$, $M$ is an $(\varepsilon, \alpha)$-quantization of the width of $\mu_P$ in direction $u$.

Proof. The width $\omega(Q_j, u)$ of a random point set $Q_j$ drawn from $\mu_P$ is a random sample from the distribution over
widths of $\mu_P$ in direction $u$. Thus, with probability $1 - \delta$, $m$ such random samples would create an $\varepsilon$-quantization.
Using the width of the $\alpha$-kernels $K_j$ instead of $Q_j$ induces an error on each random sample of at most $\alpha \cdot \omega(Q_j, u)$.
Then for a query width $w$, say there are $\gamma m$ point sets $Q_j$ that have width at most $w$ and $\gamma' m$ $\alpha$-kernels $K_j$ with width
at most $w$. Note that $\gamma' \ge \gamma$. Let $\hat{w} = w - \alpha w$. For each point set $Q_j$ that has width greater than $w$ but the corresponding $\alpha$-kernel
$K_j$ has width at most $w$, it follows that $K_j$ has width greater than $\hat{w}$. Thus the number of $\alpha$-kernels $K_j$ that have width at most $\hat{w}$ is
at most $\gamma m$, and thus there is a width $w'$ between $w$ and $\hat{w}$ such that the number of $\alpha$-kernels with width at most $w'$ is exactly $\gamma m$. □
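
The construction of $M$ can be illustrated in the plane as follows. This is a sketch under loud assumptions: `crude_kernel` is a stand-in that keeps the extreme points in a few evenly spaced directions and is not the $\alpha$-kernel algorithm of [4]; the Gaussian error model, the centers, and the value of `m` are all made up for the example.

```python
import math
import random

def width(P, u):
    """omega(P, u) for planar points: extent of P along direction u."""
    dots = [p[0] * u[0] + p[1] * u[1] for p in P]
    return max(dots) - min(dots)

def crude_kernel(Q, num_directions=16):
    """Stand-in for an alpha-kernel: extreme points of Q in evenly spaced directions."""
    K = set()
    for i in range(num_directions):
        theta = 2 * math.pi * i / num_directions
        u = (math.cos(theta), math.sin(theta))
        K.add(max(Q, key=lambda p: p[0] * u[0] + p[1] * u[1]))
    return sorted(K)

def build_kernels(samplers, m, rng):
    """One kernel K_j per random point set Q_j drawn from the distribution."""
    return [crude_kernel([draw(rng) for draw in samplers]) for _ in range(m)]

def width_quantization(kernels, u):
    """M = {omega(K_j, u)}: sorted widths of the stored kernels in direction u."""
    return sorted(width(K, u) for K in kernels)

rng = random.Random(0)
centers = [(math.cos(t), math.sin(t)) for t in range(50)]
samplers = [(lambda rng, c=c: (rng.gauss(c[0], 0.1), rng.gauss(c[1], 0.1))) for c in centers]
kernels = build_kernels(samplers, m=100, rng=rng)
M = width_quantization(kernels, (1.0, 0.0))
print(M[0], M[len(M) // 2], M[-1])
```

Because only the kernels are stored, the same structure answers width queries for any direction `u` without resampling the point sets.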

Theorem 4.1. With probability 1—5, we can construct an {e, a)-kernelfor jjlp on n points inM'' of size 0( ^(dii)/2 ^ log ^) 
an^/ m //me 0((n + 3^3^) ^ log 

$k$-Dependent $(\varepsilon, \alpha)$-Kernels. The definition of $(\varepsilon, \alpha)$-quantizations can be extended to $k$-variate $(\varepsilon, \alpha)$-quantizations
$M$ where, for a query $x \in \mathbb{R}^k$, (1) there exists a point $x' \in \mathbb{R}^k$ such that for all integers $i \in [1, k]$, $|x^{(i)} - x'^{(i)}| \le \alpha x^{(i)}$ and (2)
$|M(x) - f_{\mu_P}(x')| \le \varepsilon$. Let $x^{(i)}$ represent the $i$th coordinate of a point $x \in \mathbb{R}^k$.

$(\varepsilon, \alpha)$-kernels can be generalized to approximate other functions $f : \mathbb{R}^{dn} \to \mathbb{R}^k$, specified as follows. We say a
point $p' \in \mathbb{R}^k$ is a relative $\theta$-approximation of $p \in \mathbb{R}^k$ if for each coordinate $i$ we have $|p^{(i)} - p'^{(i)}| \le \theta p^{(i)}$. For
functions $f$ and $\theta$ where $f(K)$ is a relative $\theta(\alpha)$-approximation of $f(Q)$ when $K$ is an $\alpha$-kernel of $Q$, we say that $f$
is relative $\theta(\alpha)$-approximable.

By setting m = log ^) in the above algorithm, with probability 1 — (5, we can build a k-dependent (e, a)- 
ferne/ data structure IK with the following properties. It has size Q( ^(dii)/2 ^ log ^) and can be built in time 0( 
0,^-3/2 )^ log j^). To create a fc-variate (e, a) -quantization for a function /, create a /c-dimensional point pj = 
$f(K_j)$ for each $\alpha$-kernel $K_j$ in $\mathcal{K}$. The set $M$ of $m$ $k$-dimensional points forms the $k$-variate $(\varepsilon, \alpha)$-quantization.

Theorem 4.2. Let $f$ be a relative $\theta(\alpha)$-approximable function that takes $T_f(n)$ time to evaluate on a set of $n$ points
in $\mathbb{R}^d$. From a $k$-dependent $(\varepsilon, \alpha)$-kernel $\mathcal{K}$ with $m$ $\alpha$-kernels, with probability $1 - \delta$, we can create a $k$-variate
(e, 9 (a)) -quantization of f, of size log^'^ ^) in time 0{Tf{ ^^^^-^y2 )m). 

Proof. Each evaluation of $f$ on a point set $Q_j$ drawn from $\mu_P$ is a random sample from the distribution over $f$ on
point sets drawn from $\mu_P$ and hence these values on all $m$ sampled point sets would be an $\varepsilon$-quantization of $f_{\mu_P}$.

For a query point $w \in \mathbb{R}^k$, let $\gamma m$ point sets produce a value $w_j = f(Q_j)$ such that $w_j \preceq w$, and let $\gamma' m$ point
sets produce a value $\hat{w}_j = f(K_j)$ such that $\hat{w}_j \preceq w$. Note that $\gamma' \ge \gamma$. Because $f$ is relative $\theta(\alpha)$-approximable,
for each point set $Q_j$ such that $w_j \not\preceq w$, but $\hat{w}_j \preceq w$, then $\hat{w}_j \not\preceq \hat{w}$, where $\hat{w} = w - \theta(\alpha) w$. (More specifically, for
each coordinate $w^{(i)}$ of $w$, $\hat{w}^{(i)} = w^{(i)} - \theta(\alpha) w^{(i)}$.) Thus, the number of point sets such that $f(K_j) \preceq \hat{w}$ is at most $\gamma m$,
and hence there is a point $w'$ between $w$ and $\hat{w}$ such that the fraction of sampled point sets such that $f(K_j) \preceq w'$ is
exactly $\gamma$, and hence is within $\varepsilon$ of the true fraction of point sets sampled from $\mu_P$ with probability $1 - \delta$. □

To name a few examples, the width and diameter are relative $\alpha$-approximable functions, thus the results apply
directly with $k = 1$. The radius of the minimum enclosing ball is relative $2\alpha$-approximable with $k = 1$. The $d$
directional widths of the minimum perimeter or minimum volume axis-aligned rectangle are relative $\alpha$-approximable
with $k = d$.

4.1 Experiments with $(\varepsilon, \alpha)$-Kernels and $\varepsilon$-Quantizations

We implemented these randomized algorithms for $(\varepsilon, \alpha)$-kernels and $\varepsilon$-quantizations for diameter (diam), width in
a fixed direction (dwid), and radius of the smallest enclosing $\ell_2$ ball (seb2). We used existing code from Hai Yu [25]
for $\alpha$-kernels and Bernd Gärtner [9] for seb2. For the input set $\mu_P$ we generated 5000 points $P \subset \mathbb{R}^3$ on the surface
of a cylinder piece with radius 1 and axis length 10. Each point $p \in P$ represented the center of a Gaussian with
standard deviation 3. We set $\varepsilon = .2$ and generated $\alpha$-kernels of size at most 40 (the existing code did not allow the
user to specify a parameter $\alpha$, only the maximum size). We generated a total of $m = 40$ point sets from $\mu_P$. The
$(\varepsilon, \alpha)$-kernel has a total of 1338 points. We calculated $\varepsilon$-quantizations and $(\varepsilon, \alpha)$-quantizations for diam, dwid, and
seb2, each of size 10; see Figure 3.

Figure 3: $(\varepsilon, \alpha)$-quantization (white circles) and $\varepsilon$-quantization (black circles) for (a) seb2, (b) dwid, and (c) diam.



5 Shape Inclusion Probabilities 

We can also use a variation of Algorithm 3.1 to construct $\varepsilon$-shape inclusion probability functions. For a point set
$Q \subset \mathbb{R}^d$, let the summarizing shape $S_Q = \mathcal{S}(Q)$ be from some geometric family $\mathcal{S}$ so $(\mathbb{R}^d, \mathcal{S})$ has bounded
VC-dimension $\nu$. We randomly sample point sets $Q_j$ from $\mu_P$ and then find the summarizing shape $S_{Q_j}$ (e.g. minimum
enclosing ball) of $Q_j$. Let this set of shapes be $\mathcal{S}^{\mu_P}$. If there are multiple shapes from $\mathcal{S}$ which are equally optimal
(as can happen degenerately with, for example, minimum width slabs), choose one of these shapes at random. For a
set of shapes $\mathcal{S}' \subseteq \mathcal{S}$, let $\mathcal{S}'_p \subseteq \mathcal{S}'$ be the subset of shapes that contain $p \in \mathbb{R}^d$. We store $\mathcal{S}^{\mu_P}$ and evaluate a query
point $p \in \mathbb{R}^d$ by counting what fraction of the shapes the point is contained in, specifically returning $|\mathcal{S}^{\mu_P}_p| / |\mathcal{S}^{\mu_P}|$
in $O(\nu |\mathcal{S}^{\mu_P}|)$ time. In some cases, this evaluation can be sped up with point location data structures.
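
A minimal planar sketch of this procedure, using the axis-aligned bounding box as the summarizing shape. The function names, the tuple representation of a box, and the sampler interface are assumptions of this example; the query is the plain linear scan described above, without any point location structure.

```python
import random

def bounding_box(Q):
    """Smallest axis-aligned box enclosing the planar point set Q, as (x0, y0, x1, y1)."""
    xs, ys = zip(*Q)
    return (min(xs), min(ys), max(xs), max(ys))

def build_sip(samplers, m, rng):
    """Sample m point sets and store the summarizing shape of each one."""
    return [bounding_box([draw(rng) for draw in samplers]) for _ in range(m)]

def sip_query(shapes, p):
    """Fraction of the stored shapes that contain the query point p."""
    inside = sum(x0 <= p[0] <= x1 and y0 <= p[1] <= y1 for x0, y0, x1, y1 in shapes)
    return inside / len(shapes)

rng = random.Random(0)
centers = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
samplers = [(lambda rng, c=c: (rng.gauss(c[0], 0.3), rng.gauss(c[1], 0.3))) for c in centers]
shapes = build_sip(samplers, m=500, rng=rng)
print(sip_query(shapes, (1.0, 1.0)), sip_query(shapes, (5.0, 5.0)))
```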

Theorem 5.1. For a distribution fip of n points and a family of summarizing shapes {M.'^,§) with bounded VC- 
dimension v, with probability 1 — 5 we can construct an e-sip function of size 0(2^^+1^ log ^) and in time 
0{Tg{n) ^ log jg), where T§{n) is the time it takes to determine the summarizing shape of any point set Q CW^ of 
size n. 

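Proof. If $(\mathbb{R}^d, \mathcal{S})$ has VC-dimension $\nu$, then the dual range space $(\mathcal{S}, \mathcal{P}^*)$ has VC-dimension $\nu' \le 2^{\nu+1}$, where $\mathcal{P}^*$
is all subsets $\mathcal{S}_p \subseteq \mathcal{S}$, for any $p \in \mathbb{R}^d$, such that $\mathcal{S}_p = \{S \in \mathcal{S} \mid p \in S\}$. Using the above algorithm, sample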
m = log ^) point sets Q from p,p and generate the m summarizing shapes Sq. Each shape is a random 
sample from $\mathcal{S}$ according to $\mu_P$, and thus $\mathcal{S}^{\mu_P}$ is an $\varepsilon$-sample of $(\mathcal{S}, \mathcal{P}^*)$.

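Let $w_{\mu_P}(S)$, for $S \in \mathcal{S}$, be the probability that $S$ is the summarizing shape of a point set $Q$ drawn randomly from
$\mu_P$. Let $W_{\mu_P}(\mathcal{S}') = \int_{S \in \mathcal{S}'} w_{\mu_P}(S)$, where $\mathcal{S}' \subset \mathcal{P}^*$, be the probability that some shape from the subset $\mathcal{S}'$ is the
summarizing shape of $Q$ drawn from $\mu_P$.

We approximate the sip function at $p \in \mathbb{R}^d$ by returning the fraction $|\mathcal{S}^{\mu_P}_p| / m$. The true answer to the sip
function at $p \in \mathbb{R}^d$ is $W_{\mu_P}(\mathcal{S}_p)$. Since $\mathcal{S}^{\mu_P}$ is an $\varepsilon$-sample of $(\mathcal{S}, \mathcal{P}^*)$, then with probability $1 - \delta$

$$\left| \frac{|\mathcal{S}^{\mu_P}_p|}{m} - \frac{W_{\mu_P}(\mathcal{S}_p)}{1} \right| = \left| \frac{|\mathcal{S}^{\mu_P}_p|}{|\mathcal{S}^{\mu_P}|} - \frac{W_{\mu_P}(\mathcal{S}_p)}{W_{\mu_P}(\mathcal{P}^*)} \right| \le \varepsilon.$$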



Since for the family of summarizing shapes $\mathcal{S}$ the range space $(\mathbb{R}^d, \mathcal{S})$ has VC-dimension $\nu$, each shape can be stored
using $O(\nu)$ space. □

The size can then be reduced to 0(2^^+1^ log D in time 0((2'^+i)3-2"+'+i(| log l)2"'+'+i) using deterministic 
techniques. 



Representing $\varepsilon$-sip functions by Isolines. Shape inclusion probability functions are density functions. One
convenient way of visually representing a density function in $\mathbb{R}^2$ is by drawing the isolines. A $\gamma$-isoline is a closed
curve such that on the inside the density function is $> \gamma$ and on the outside is $< \gamma$.

In each part of Figure 4 a set of 5 circles corresponds to points with a probability distribution. For parts (a) and (c),
the probability distribution is uniform over those circles; in parts (b) and (d) it is drawn from a multivariate Gaussian
distribution with standard deviation as the radius. We generate $\varepsilon$-sip functions for smallest enclosing ball in Figure
4(a,b) and for smallest axis-aligned bounding box in Figure 4(c,d).

Figure 4: (a) The shape inclusion probability for the smallest enclosing ball, for points uniformly distributed inside
the circles. (b) The same, but for normally distributed points around the circle centers, with standard deviations
given by the radii. (c) The shape inclusion probability for the smallest enclosing axis-aligned rectangle, for points
uniformly distributed inside the circles. (d) The same, but for normally distributed points.

In all figures we draw approximations of $\{.9, .7, .5, .3, .1\}$-isolines. These drawings are generated by randomly
selecting $m = 5000$ (a,b) or $m = 25000$ (c,d) shapes, counting the number of inclusions at different points in the
plane and interpolating to get the isolines. The innermost and darkest region has probability > 90%, the next one
probability > 70%, etc.; the outermost region has probability < 10%.

When $\mu_P$ describes the distribution for $n$ points and $n$ is large, then isolines are generally connected for convex
summarizing shapes. In fact, in $O(n)$ time we can create a point which is contained in the convex hull of a point set
sampled from $\mu_P$ with high probability. Specifics are discussed in Appendix B.

6 Deterministic Constructions of $\varepsilon$-Quantizations

In this section we consider functions $f$ which describe the size of some summarizing shape from the family $\mathcal{A}$
such that $(\mathbb{R}^d, \mathcal{A})$ has constant VC-dimension. In particular, given a point set $Q \subset \mathbb{R}^d$, let $A(Q) \subset \mathbb{R}^d$ (e.g.
smallest enclosing ball) be the summarizing shape for $Q$, and let $f(Q)$ be a statistic of $A(Q)$ (e.g. radius of the
smallest enclosing ball). The overall strategy will be to deterministically approximate each $\mu_{p_i}$ with a point set $Q_{p_i}$,
although not with respect to the range space $(\mu_{p_i}, \mathcal{A})$, but with a more complicated range space described below.
Let $Q_P = \{Q_{p_i}\}_i$ describe this set of point sets. Then let the function $f(Q_P, r)$ describe the fraction of point sets
$Q' = (q_1 \in Q_{p_1}, q_2 \in Q_{p_2}, \ldots, q_n \in Q_{p_n})$ for $\{Q_{p_1}, \ldots, Q_{p_n}\} = Q_P$ such that $f(Q') \le r$. We show that we
can generate a set of point sets $Q_P$ such that $f(Q_P, r)$ is a good approximation of $f_{\mu_P}(r)$. And we show how to
efficiently evaluate $f(Q_P, r)$.
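
For intuition, $f(Q_P, r)$ can be evaluated by brute force over every way of picking one point from each $Q_{p_i}$. The sketch below does exactly that and therefore takes time exponential in $n$; it is a reference definition, not the efficient evaluation referred to above. The function names and the tiny input are hypothetical, and the comparison uses $\le$ to match the definition of $f_{\mu_P}$.

```python
import itertools
import math

def diameter(Q):
    """Largest pairwise Euclidean distance in the point set Q."""
    return max(math.dist(a, b) for a, b in itertools.combinations(Q, 2))

def fraction_at_most(point_sets, f, r):
    """Fraction of tuples (q_1 in Q_{p_1}, ..., q_n in Q_{p_n}) with f(tuple) <= r."""
    total = 0
    hits = 0
    for Q_prime in itertools.product(*point_sets):
        total += 1
        if f(Q_prime) <= r:
            hits += 1
    return hits / total

Q_P = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0), (4.0, 0.0)]]
print(fraction_at_most(Q_P, diameter, 3.0))
```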

6.1 Approximating $\mu_P$

In this section we require that each $\mu_{p_i}$ is either defined by a polygonal surface $S$ with $b$ facets or is polygonally
approximable, i.e. it can be approximated by a finite polygonal surface $S$ with $b$ facets, for some constant $b$, as in [21].

It might seem that we can just create an $\varepsilon$-sample of $(\mu_{p_i}, \mathcal{A})$ for each $\mu_{p_i}$, but we need to consider a more
complicated family $\mathcal{A}_{f,n}$. Given a family of shapes $\mathcal{A}$ and a function $f$ which computes a value determined by a
summarizing shape $A \in \mathcal{A}$ for a set of n points, $\mathcal{A}_{f,n}$ is a family of shapes where each is defined by a set of
$n - 1$ points $T \subset \mathbb{R}^d$ and a value $w$. Specifically, $\mathcal{A}_{f,n}(T, w)$ is the set of points $\{p \in \mathbb{R}^d \mid f(T \cup p) \le w\}$.
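
To make the definition concrete, the following minimal Python sketch tests membership in $\mathcal{A}_{f,n}(T, w)$ when $f$ is the area of the axis-aligned bounding box in the plane. The helper names `aabb_area` and `in_shape` are illustrative assumptions, not part of the construction itself.

```python
def aabb_area(points):
    """f(Q): area of the smallest axis-aligned bounding box of a planar point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def in_shape(T, w, p, f=aabb_area):
    """Return True when p lies in A_{f,n}(T, w), i.e. when f(T + [p]) <= w."""
    return f(list(T) + [p]) <= w
```

For T = [(0, 0), (2, 1)] and w = 2, any point inside the box spanned by T belongs to the shape, while (3, 1) does not because it raises the area to 3.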

In certain cases, such as the volume of the axis-aligned bounding box, $(\mu_{p_i}, \mathcal{A}_{f,n})$ has constant VC-dimension.
Shapes from $\mathcal{A}$ are determined by the placement of $2d$ points, the most extreme in each axis direction, thus its
shatter dimension is $\sigma_f = 2d$. Hence an $\varepsilon$-sample for $(\mu_{p_i}, \mathcal{A}_{f,n})$ of size $O((1/\varepsilon^2) \log(1/\varepsilon))$ can be calculated in time
$O((1/\varepsilon^2) \log^2(1/\varepsilon))$ for each $\mu_{p_i}$.

In other cases, such as the radius of smallest enclosing $\ell_2$ disks, $\mathcal{A}_{f,n}$ defines regions which have
$O(n)$ $(d-1)$-dimensional faces on its boundary and thus $(\mu_{p_i}, \mathcal{A}_{f,n})$ has VC-dimension n. Naive
techniques would take time exponential in n to deterministically create an $\varepsilon$-sample, but we can do
better by decomposing a shape $A \in \mathcal{A}_{f,n}$ into $O(n)$ disjoint simpler shapes. In the case of disks,
$\mathcal{A}_{f,n}$ has its boundary defined by at most $2n$ circular arcs of two different radii. We can choose
a point in the convex hull of $T$ and draw lines to each intersection of circular arcs (see the figure
on the right). The intersection of the disc defining each boundary piece and the halfspaces for the drawn lines at its
endpoints describes a wedge from a family $\mathcal{W}_{f,n}$. The range space $(\mathbb{R}^2, \mathcal{W}_{f,n})$ has VC-dimension at most 9 because
shapes from $\mathcal{W}_{f,n}$ are formed by the intersection of three shapes from families that would each have VC-dimension
3 in a range space on the same ground set. Thus, an $(\varepsilon/2n)$-sample of $(\mu_{p_i}, \mathcal{W}_{f,n})$ is an $\varepsilon$-sample of $(\mu_{p_i}, \mathcal{A}_{f,n})$. So
for the radius of the smallest enclosing $\ell_2$ balls we can create an $\varepsilon$-sample of $(\mu_{p_i}, \mathcal{A}_{f,n})$ of size $O((n^2/\varepsilon^2) \log(n/\varepsilon))$ in time
$O((n^2/\varepsilon^2) \log^2(n/\varepsilon))$ for each $\mu_{p_i}$.

We generalize both of these cases to other shapes and in higher dimensions in Appendix C. There are also
illustrations of various shapes from $\mathcal{A}_{f,n}$.

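Lemma 6.1. When each $\mu_{p_i}$ is approximated with an $\varepsilon'$-sample $Q_{p_i}$ of $(\mu_{p_i}, \mathcal{A}_{f,n})$, then for any $r$

$$\left| \Pr[f_{\mu_P}(P) \le r] - f(\{Q_{p_1}, Q_{p_2}, \ldots, Q_{p_n}\}, r) \right| \le \varepsilon' n.$$
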
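Proof. When $P$ is drawn from a distribution $\mu_P$, then we can write the probability that $f_{\mu_P}(P) \le r$ as follows:

$$\Pr[f_{\mu_P}(P) \le r] = \int_{q_1} \mu_{p_1}(q_1) \int_{q_2} \mu_{p_2}(q_2) \cdots \int_{q_n} \mu_{p_n}(q_n) \, \mathbb{1}(f(\{q_1, q_2, \ldots, q_n\}) \le r) \; dq_n \, dq_{n-1} \cdots dq_1.$$

Consider the innermost integral

$$\int_{q_n} \mu_{p_n}(q_n) \, \mathbb{1}(f(\{q_1, q_2, \ldots, q_n\}) \le r) \; dq_n,$$

where $\{q_1, q_2, \ldots, q_{n-1}\}$ are fixed. The indicator function is true when for $q_n$, $f(\{q_1, q_2, \ldots, q_{n-1}, q_n\}) \le r$, and
hence $q_n$ is contained in the shape $\mathcal{A}_{f,n}(\{q_1, q_2, \ldots, q_{n-1}\}, r)$. Thus if we have an $\varepsilon'$-sample $Q_{p_n}$ for $(\mu_{p_n}, \mathcal{A}_{f,n})$,
then we can guarantee that

$$\int_{q_n} \mu_{p_n}(q_n) \, \mathbb{1}(f(\{q_1, q_2, \ldots, q_n\}) \le r) \; dq_n \le \frac{1}{|Q_{p_n}|} \sum_{q_n \in Q_{p_n}} \mathbb{1}(f(\{q_1, q_2, \ldots, q_{n-1}, q_n\}) \le r) + \varepsilon'.$$

We can then move the $\varepsilon'$ to the outside (each $\mu_{p_i}$ integrates to 1), and we can change the order of the integrals to write:

$$\Pr[f_{\mu_P}(P) \le r] \le \frac{1}{|Q_{p_n}|} \sum_{q_n \in Q_{p_n}} \int_{q_1} \mu_{p_1}(q_1) \int_{q_2} \mu_{p_2}(q_2) \cdots \int_{q_{n-1}} \mu_{p_{n-1}}(q_{n-1}) \, \mathbb{1}(f(\{q_1, q_2, \ldots, q_n\}) \le r) \; dq_{n-1} \, dq_{n-2} \cdots dq_1 + \varepsilon'.$$

Repeating this procedure n times we get:

$$\Pr[f_{\mu_P}(P) \le r] \le \left( \prod_{i=1}^{n} \frac{1}{|Q_{p_i}|} \right) \sum_{q_1 \in Q_{p_1}} \cdots \sum_{q_n \in Q_{p_n}} \mathbb{1}(f(\{q_1, q_2, \ldots, q_n\}) \le r) + \varepsilon' n = f(Q_P, r) + \varepsilon' n.$$

Using the same technique we can achieve a symmetric lower bound for $\Pr[f_{\mu_P}(P) \le r]$. □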
By setting $\varepsilon' = \varepsilon/n$ we can achieve an additive $\varepsilon$-approximation by using an $\varepsilon'$-sample for each $(\mu_{p_i}, \mathcal{A}_{f,n})$.

6.2 Evaluating $f(Q_P, r)$

Evaluating $f(Q_P, r)$ in time polynomial in $n$ and $|Q_{p_i}|$, for any $i$, is not completely trivial since there are $\prod_{i=1}^{n} |Q_{p_i}|$
possible sets in $Q_P$. Let a good set be a set of n points, $G$, such that for each $Q_i$ there exists a point $g_i \in G$ such that
$g_i \in Q_i$. For each good set $G$ there exists a unique basis of at most $\sigma_f$ points¹ which define the summarizing shape
of $G$ (remember the shatter dimension of $\mathcal{A}$ is $\sigma_f$ and $\sigma_f \le \nu_f$, the VC-dimension). Define a valid basis to be a set
of at most $\sigma_f$ points in $Q_P$ such that each point is from a different $Q_i$ and if any point is removed the summarizing
shape changes. Each valid basis forms a basis for several good sets.

We now construct $R$, an $\varepsilon$-quantization of $f_{\mu_P}$. This approximation is created by calculating the summarizing
shape for all good sets. Even though there are an exponential number of good sets, there are only a polynomial
number of valid bases. Thus for each valid basis, we count the number of good sets it represents. We let
each valid basis contribute to the $\varepsilon$-quantization; its position is determined by its value in $f$ and its weight by
the number of good sets it represents. We initially store the $\varepsilon$-quantization as a sorted list of tuples, each holding a value
$r = f(\{q_1, q_2, \ldots, q_{\sigma_f}\})$ for some valid basis $\{q_1, q_2, \ldots, q_{\sigma_f}\}$ and the fraction of the good sets which are
represented by this valid basis. The details are outlined in Algorithm 6.1.



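Algorithm 6.1 Construct $\varepsilon$-Quantization from $Q_P$

for all valid bases $q_1, q_2, \ldots, q_{\sigma_f} \in Q_P$ do
    for $i = 1$ to $n$ do
        if $q_1 \in Q_i$ or $q_2 \in Q_i$ or ... or $q_{\sigma_f} \in Q_i$ then
            Set $w_i = 1/|Q_i|$.
        else
            Set $w_i = \frac{1}{|Q_i|} \sum_{q_j \in Q_i} \mathbb{1}(q_j \in A(\{q_1, q_2, \ldots, q_{\sigma_f}\}))$.
    Insert $(f(q_1, q_2, \ldots, q_{\sigma_f}), \prod_i w_i)$ into $R$.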
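
As an illustration of the counting idea, the sketch below instantiates it in Python for the simplest setting we could choose: points on a line, with $f$ the width (maximum minus minimum), so that a valid basis is a pair consisting of the minimum and the maximum taken from two different sample sets. The function names, and the assumptions that all sample coordinates are distinct and that n is at least 2, are ours for this example only. A brute-force enumeration of all good sets is included for comparison.

```python
from itertools import product

def quantization_1d(samples):
    """samples[i] is the list Q_i of 1D sample points for uncertain point i.
    Returns sorted (value, weight) tuples, one per valid basis with nonzero weight."""
    n = len(samples)
    R = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for a in samples[i]:              # a plays the role of the minimum
                for b in samples[j]:          # b plays the role of the maximum
                    if a >= b:
                        continue
                    # w_i and w_j are 1/|Q_i| and 1/|Q_j| for the sets holding basis points
                    weight = 1.0 / (len(samples[i]) * len(samples[j]))
                    for k in range(n):
                        if k in (i, j):
                            continue
                        # w_k: fraction of Q_k lying inside the shape defined by the basis
                        inside = sum(1 for q in samples[k] if a < q < b)
                        weight *= inside / len(samples[k])
                    if weight > 0:
                        R.append((b - a, weight))
    return sorted(R)

def evaluate(R, r):
    """Fraction of good sets whose width is at most r, read off the weighted list."""
    return sum(w for value, w in R if value <= r)

def brute_force_fraction(samples, r):
    """Same quantity, computed by enumerating every good set explicitly."""
    good_sets = list(product(*samples))
    hits = sum(1 for g in good_sets if max(g) - min(g) <= r)
    return hits / len(good_sets)
```

The weighted list has only polynomially many entries, while `brute_force_fraction` enumerates a number of good sets that grows exponentially with n; on small inputs the two agree.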



We now summarize the full deterministic algorithm. For each $(\mu_{p_i}, \mathcal{A}_{f,n})$ we create an $(\varepsilon/n)$-sample $Q_{p_i}$ of size
$\alpha_f(n, \varepsilon)$. This makes the set $Q_P$ have $\eta = \sum_{i=1}^{n} |Q_{p_i}| = n \alpha_f(n, \varepsilon)$ points in its sets. We examine $O(\eta^{\sigma_f})$ valid
bases. For each valid basis we can evaluate $f(G)$ and compute $w_i$ in $RS_f(n, \varepsilon)$ time using a range searching data
structure, after preprocessing or with a naive search. Thus the deterministic running time for constructing an
$\varepsilon$-quantization is $O(\eta^{\sigma_f} RS_f(n, \varepsilon))$, which is presented for various summarizing shapes in Table 3. For instance, for
volume of the axis-aligned bounding box this takes $O((n^{6d}/\varepsilon^{4d}) \, \mathrm{polylog}(n/\varepsilon))$ time and for radius of the smallest enclosing
disks this takes $O((n^{17.5}/\varepsilon^{7}) \, \mathrm{polylog}(n/\varepsilon))$ time. The total construction time for the $\varepsilon$-quantizations is the sum of this time and
the time to construct $n$ $(\varepsilon/n)$-samples of $(\mathbb{R}^d, \mathcal{A}_{f,n})$; for both smallest enclosing disks and for axis-aligned bounding
boxes it is the former.

A univariate $\varepsilon$-quantization can be reduced to size $O(1/\varepsilon)$. Furthermore, we can create $k$-variate $\varepsilon$-quantizations
using the same procedure (such as the width in the $k$ dimensions of an axis-aligned bounding box). The condition in
Lemma 6.1 where $f_{\mu_P}(P) \le r$ can be replaced with a $k$-variate condition $f_{\mu_P}(P) \preceq r$ for $r \in \mathbb{R}^k$. Thus the same
argument applies when we define $f : \mathbb{R}^{dn} \to \mathbb{R}^k$, and we can create $k$-variate $\varepsilon$-quantizations of size fe^^ log'^*^'^^ |
in the same deterministic times as long as $\nu_f = O(k)$.

Theorem 6.1. For any range space $(\mu_P, \mathcal{A}_f)$ for a distribution $\mu_P$ of n points, with VC-dimension $\nu_f$, where each
$(\mu_{p_i}, \mathcal{A}_{f,n})$ has an $(\varepsilon/n)$-sample of size $\alpha_f(n, \varepsilon)$, and where, after preprocessing $m$ points and with near-linear space
and time, we can count the number of points in a shape from $\mathcal{A}_f$ in $RS(m, \mathcal{A}_f)$ time, we construct a $k$-variate
$\varepsilon$-quantization of $f_{\mu_P}$ of size A:^^ log*^*-'^'' j in $O((n \alpha_f(n, \varepsilon))^{\nu_f} \cdot RS((n \alpha_f(n, \varepsilon))^{\nu_f}, \mathcal{A}_f))$ time.



¹ This uniqueness requires careful construction of the $\varepsilon$-samples $Q_i$, as described in Appendix A.1.

A Primer on $\varepsilon$-Samples

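We recall from Section 2 that for a range space $(P, \mathcal{A})$ an $\varepsilon$-sample $Q \subset P$ guarantees

$$\forall_{R \in \mathcal{A}} \quad \left| \frac{\phi(R \cap Q)}{\phi(Q)} - \frac{\phi(R \cap P)}{\phi(P)} \right| \le \varepsilon,$$

where $|\cdot|$ takes the absolute value and $\phi(\cdot)$ returns the measure of a point set. In the discrete case $\phi(Q)$ returns the
cardinality of $Q$.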

When $P \subset \mathbb{R}^d$ we describe a few common examples of $\mathcal{A}$. Let $\mathcal{B}$ describe all subsets of $P$ determined by
containment in some ball. Let $\mathcal{R}_d$ describe all subsets of $P$ defined by containment in some $d$-dimensional
axis-aligned box. Let $\mathcal{H}$ describe all subsets of $P$ defined by containment in some halfspace. Throughout the paper we
use $\mathcal{A}$ generically to represent one such family of ranges.

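Also recall from Section 2 that if $(P, \mathcal{A})$ has bounded VC-dimension $\nu$, then we can create an $\varepsilon$-sample, with
probability $1 - \delta$, by sampling $O((1/\varepsilon^2)(\nu + \log(1/\delta)))$ points at random, or deterministically of size $O((\nu/\varepsilon^2) \log(\nu/\varepsilon))$ in time
$O(\nu^{3\nu} n ((1/\varepsilon^2) \log(\nu/\varepsilon))^{\nu})$. There exist $\varepsilon$-samples of slightly smaller sizes [18], but efficient constructions are not known.
If $(P, \mathcal{A})$ has VC-dimension $\nu$, this also implies that $(P, \mathcal{A})$ contains at most $|P|^{\nu}$ sets.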

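Similarly, the shatter function $\pi_{(P,\mathcal{A})}(m)$ of a range space $(P, \mathcal{A})$ is the maximum number of distinct sets that ranges
from $\mathcal{A}$ induce on a subset $S \subset P$ with $|S| = m$. The shatter dimension $\sigma$ of a range space $(P, \mathcal{A})$ is the minimum value such that
$\pi_{(P,\mathcal{A})}(m) = O(m^{\sigma})$. It can be shown [12] that $\sigma \le \nu$ and $\nu = O(\sigma \log \sigma)$.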
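
A small Python sketch can make the shatter function tangible. It uses closed intervals on the real line as the family of ranges (our choice of example, not one of the families studied above) and counts by brute force how many distinct subsets of m points the intervals induce.

```python
def interval_ranges(points):
    """All distinct subsets of `points` (distinct reals) cut out by closed intervals."""
    pts = sorted(points)
    induced = {frozenset()}                      # the empty range
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            induced.add(frozenset(pts[i:j + 1]))  # every contiguous run is one range
    return induced

def shatter_function(m):
    """Number of induced subsets on m points; for intervals this is m(m+1)/2 + 1."""
    return len(interval_ranges(range(m)))
```

Since the count grows as O(m^2), the shatter dimension of intervals on the line is 2 in the sense defined above.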

For a range space $(P, \mathcal{A})$ the dual range space is defined as $(\mathcal{A}, P^*)$ where $P^*$ is all subsets $\mathcal{A}_p \subset \mathcal{A}$ defined for an
element $p \in P$ such that $\mathcal{A}_p = \{A \in \mathcal{A} \mid p \in A\}$. If $(P, \mathcal{A})$ has VC-dimension $\nu$, then $(\mathcal{A}, P^*)$ has VC-dimension
$< 2^{\nu+1}$. Thus, if the VC-dimension of $(\mathcal{A}, P^*)$ is constant, then the VC-dimension of $(P, \mathcal{A})$ is also constant [17].
Hence, the standard $\varepsilon$-sample theorems apply to dual range spaces as well.

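Let $g : \mathbb{R} \to \mathbb{R}^+$ be a function where $\int_{x=-\infty}^{\infty} g(x) \, dx = 1$. We can create an $\varepsilon$-sample $Q_g$ of $(g, \mathcal{I}_+)$, where $\mathcal{I}_+$
describes the set of all one-sided intervals of the form $(-\infty, t)$, so that

$$\max_t \left| \int_{x=-\infty}^{t} g(x) \, dx - \frac{1}{|Q_g|} \sum_{q \in Q_g} \mathbb{1}(q < t) \right| \le \varepsilon.$$

We can construct $Q_g$ of size $O(1/\varepsilon)$ by choosing a set of points in $Q_g$ so that the integral between two consecutive
points is always $\varepsilon$. But we do not need to be so precise. Consider the set of $2/\varepsilon$ points $\{q'_1, q'_2, \ldots, q'_{2/\varepsilon}\}$ such that
$\int_{x=-\infty}^{q'_i} g(x) \, dx = i\varepsilon/2$. Any set of $1/\varepsilon$ points $Q_g = \{q_1, q_2, \ldots, q_{1/\varepsilon}\}$ such that $q'_{2i-1} \le q_i \le q'_{2i}$ is an $\varepsilon$-sample.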
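
The quantile construction can be sketched as follows. This is a minimal Python illustration that assumes the inverse cumulative distribution function of $g$ is available as a callable (`inverse_cdf` is a hypothetical parameter); it places the sample points at the mid-quantiles, which is one admissible choice among the many allowed by the condition above.

```python
import math

def interval_sample(inverse_cdf, eps):
    """Return k = ceil(1/eps) points placed at quantiles (i + 0.5)/k of the distribution."""
    k = math.ceil(1.0 / eps)
    return [inverse_cdf((i + 0.5) / k) for i in range(k)]

def fraction_below(Q, t):
    """Fraction of sample points in the one-sided interval (-infinity, t)."""
    return sum(1 for q in Q if q < t) / len(Q)
```

For an exponential distribution one could pass `lambda u: -math.log(1.0 - u)` as `inverse_cdf`; the fraction reported by `fraction_below` then stays within `eps` of the true cumulative probability for every threshold `t`.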



A.1 $\varepsilon$-Samples of Distributions.

We say a subset $W \subset \mathbb{R}^d$ is polygonal approximable if there exists a polygonal shape $S$ with $m$ facets such that
$\phi(W \setminus S) + \phi(S \setminus W) \le \varepsilon \phi(W)$ for any $\varepsilon > 0$. Usually, $m$ is dependent on $\varepsilon$. In turn, such a polygonal shape
$S$ describes a continuous point set where $(S, \mathcal{A})$ can be given an $\varepsilon$-sample $Q$ using $O((1/\varepsilon^2) \log(1/\varepsilon))$ points if $(S, \mathcal{A})$ has
bounded VC-dimension [17] or using $O((1/\varepsilon) \log^{2k}(1/\varepsilon))$ points if $\mathcal{A}$ is defined by a constant number $k$ of directions [21].
For instance, when $\mathcal{A} = \mathcal{B}$ is the set of all balls then the first case applies, and when $\mathcal{A} = \mathcal{R}_2$ is the set of all
axis-aligned rectangles then either case applies.

A shape $W \subset \mathbb{R}^{d+1}$ may describe a distribution $\mu : \mathbb{R}^d \to [0, 1]$. We note that many common distributions like
multivariate Gaussian distributions are polygonally approximable. For instance, for a range space $(\mu, \mathcal{B})$, the
range space of the associated shape is $(W, \mathcal{B} \times \mathbb{R})$ where $\mathcal{B} \times \mathbb{R}$ describes balls in $\mathbb{R}^d$ for the first $d$ coordinates
and any points in the $(d+1)$th coordinate.

The general scheme to create an $\varepsilon$-sample for $(S, \mathcal{A})$, where $S \subset \mathbb{R}^d$ is a polygonal shape, is to use a lattice $\Lambda$ of
points. A lattice $\Lambda$ in $\mathbb{R}^d$ is an infinite set of points defined such that for $d$ vectors $\{v_1, \ldots, v_d\}$ that form a basis,
for any point $p \in \Lambda$, $p + v_i$ and $p - v_i$ are also in $\Lambda$ for any $i \in [1, d]$. We first create a discrete $(\varepsilon/2)$-sample $M \subset \Lambda$
of $(S, \mathcal{A})$ and then create an $(\varepsilon/2)$-sample $Q$ of $(M, \mathcal{A})$ using standard techniques [5, 21]. Then $Q$ is an $\varepsilon$-sample of
$(S, \mathcal{A})$. For a shape $S$ with $m$ $(d-1)$-faces on its boundary, any subset $A' \subset \mathbb{R}^d$ that is described by a subset from
$(S, \mathcal{A})$ is an intersection $A' = A \cap S$ for some $A \in \mathcal{A}$. Since $S$ has $m$ $(d-1)$-dimensional faces, we can bound
the VC-dimension of $(S, \mathcal{A})$ as $\nu = O((m + \nu_{\mathcal{A}}) \log(m + \nu_{\mathcal{A}}))$ where $\nu_{\mathcal{A}}$ is the VC-dimension of $(\mathbb{R}^d, \mathcal{A})$. Finally
the set $M = S \cap \Lambda$ is determined by choosing an arbitrary initial origin point in $\Lambda$ and then uniformly scaling all
vectors $\{v_1, \ldots, v_d\}$ until $|M| = O((\nu/\varepsilon^2) \log(\nu/\varepsilon))$ [17]. This construction follows a less general but smaller construction
in Phillips [21].

It follows that we can create such an $\varepsilon$-sample of size $|M|$ in time $O(|M| m \log |M|)$ by starting with a scaling of
the lattice so a constant number of points are in $S$ and then doubling the scale until we get to within a factor of d of
$|M|$. If there are n points inside $S$, it takes $O(nm)$ time to count them.
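
The doubling search over lattice scales can be illustrated with a short Python sketch. It makes several simplifying assumptions that are ours: the shape is a convex polygon in the plane given by counter-clockwise vertices, the lattice is the axis-parallel square lattice, and refinement is done by halving the spacing until a target count is reached.

```python
import math

def inside_convex(poly, p):
    """True if p lies in the convex polygon poly (vertices in counter-clockwise order)."""
    for (ax, ay), (bx, by) in zip(poly, poly[1:] + poly[:1]):
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < 0:
            return False
    return True

def lattice_points(poly, origin, spacing):
    """Points of the square lattice origin + spacing * Z^2 that fall inside poly."""
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    i_lo = math.ceil((min(xs) - origin[0]) / spacing)
    i_hi = math.floor((max(xs) - origin[0]) / spacing)
    j_lo = math.ceil((min(ys) - origin[1]) / spacing)
    j_hi = math.floor((max(ys) - origin[1]) / spacing)
    pts = []
    for i in range(i_lo, i_hi + 1):
        for j in range(j_lo, j_hi + 1):
            p = (origin[0] + i * spacing, origin[1] + j * spacing)
            if inside_convex(poly, p):
                pts.append(p)
    return pts

def lattice_sample(poly, origin, target):
    """Halve the lattice spacing until at least `target` lattice points lie inside poly."""
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    spacing = max(max(xs) - min(xs), max(ys) - min(ys))
    pts = lattice_points(poly, origin, spacing)
    while len(pts) < target:
        spacing /= 2.0
        pts = lattice_points(poly, origin, spacing)
    return pts
```

The `origin` argument is the freely chosen lattice origin; shifting it changes which points are selected without changing the density of the lattice.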

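Lemma A.1. For a polygonal shape $S \subset \mathbb{R}^d$ with $m$ facets, we can construct an $\varepsilon$-sample for $(S, \mathcal{A})$ of size
$O((\nu/\varepsilon^2) \log(\nu/\varepsilon))$ in time $O(m (\nu/\varepsilon^2) \log^2(\nu/\varepsilon))$, where $(\mathbb{R}^d, \mathcal{A})$ has VC-dimension $\nu_{\mathcal{A}}$ and $\nu = O((\nu_{\mathcal{A}} + m) \log(\nu_{\mathcal{A}} + m))$.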

An important part of the above construction is the arbitrary choice of the origin points of the lattice $\Lambda$. This allows
us to arbitrarily shift the lattice defining $M$ and thus the set $Q$. In Section 6 we need to construct n $\varepsilon$-samples
$\{Q_1, \ldots, Q_n\}$ for n range spaces $\{(S_1, \mathcal{A}), \ldots, (S_n, \mathcal{A})\}$. In Algorithm 6.1 we examine sets of $\nu_{\mathcal{A}}$ points, each
from separate $\varepsilon$-samples, that define a minimal shape $A \in \mathcal{A}$. It is important that we do not have two such (possibly
not disjoint) sets of $\nu_{\mathcal{A}}$ points that define the same minimal shape $A \in \mathcal{A}$. (Note, this does not include cases where,
say, two points are antipodal on a disk and any other point in the disk added to a set of $\nu_{\mathcal{A}} = 3$ points forms such a
set; it refers to cases where, say, four points lie (degenerately) on the boundary of a disc.) We can guarantee this by
enforcing a property on all pairs of origin points $p$ and $q$ for $(S_i, \mathcal{A})$ and $(S_j, \mathcal{A})$. For the purpose of construction, it
is easiest to consider only the $l$th coordinates $p_l$ and $q_l$ for any pair of origin points or lattice vectors (where the same
lattice vectors are used for each lattice). We enforce a specific property on every such pair $p_l$ and $q_l$, for all $l$ and all
distributions and lattice vectors.

First, consider the case where $\mathcal{A} = \mathcal{R}_d$ describes axis-aligned bounding boxes. It is easy to see that if for all pairs
$p_l$ and $q_l$ the difference $(p_l - q_l)$ is irrational, then we cannot have more than $2d$ points on the boundary of an axis-aligned bounding
box, hence the desired property is satisfied.

Now consider the more complicated case where $\mathcal{A} = \mathcal{B}$ describes smallest enclosing balls. There is a polynomial
of degree 2 that describes the boundary of the ball, so we can enforce that for all pairs $p_l$ and $q_l$ the difference $(p_l - q_l)$ is of
the form $c_1 (r_{p_l})^{1/3} + c_2 (r_{q_l})^{1/3}$ where $c_1$ and $c_2$ are rational coefficients and $r_{p_l}$ and $r_{q_l}$ are distinct integers that are
not multiples of cubes. Now if $\nu_{\mathcal{A}} = d + 1$ such points satisfy (and in fact define) the equation of the boundary of a
ball, then no $(d+2)$th point which has this property with respect to the first $d + 1$ can also satisfy this equation.

More generally, if $\mathcal{A}$ can be described with a polynomial of degree $p$ with $\nu$ variables, then enforce that every
pair of coordinates are the sum of $(p+1)$-roots. This ensures that no $\nu + 1$ points can satisfy the equation, and the
undesired situation cannot occur.



B A Center Point for $\mu_P$

We can create a point $q \in \mathbb{R}^d$ that is in the convex hull of a sampled point set $Q$ from $\mu_P$ with high probability. This
implies that for any summarizing shape that contains the convex hull, $q$ is also contained in that summarizing shape.
Let $\mathcal{H}$ be the family of subsets defined by halfspaces. We use the following algorithm:

1. Create 2-approximate center points $p_i$ for each $\mu_{p_i}$ (i.e. using a $(1/4)$-sample of $(\mu_{p_i}, \mathcal{H})$). Let the set be $P$.

2. Create a 2-approximate center point $q$ of $P$.

All steps can be done in $O(n)$ time because we can create $(1/4)$-samples of all range spaces $(\mu_{p_i}, \mathcal{H})$ and of $(P, \mathcal{H})$
in $O(n)$ time. Constructing approximate center points can be done in $O(1)$ time on a constant sized set, such as a
$(1/4)$-sample [7].

Lemma B.1. Given a distribution of a point set $\mu_P$ (such that each point distribution is polygonally approximable)
of n points in $\mathbb{R}^d$, there is an $O(n)$ time algorithm to create a point $q$ that will be in the convex hull of a point set
drawn from $\mu_P$ with probability $\ge 1 - ((1 - 1/(2d+2))^{1/(2d+2)})^n$.

Proof. For each $p_i \in P$, any halfspace that has $p_i$ on its boundary and does not contain $q$ has probability $\ge 1/(2d+2)$
of containing a random point from $\mu_{p_i}$. Thus for any direction $u \in \mathbb{S}^{d-1}$ there are at least $n/(2d+2)$ points $p_i$ from
$P$ for which $\langle q, u \rangle \le \langle p_i, u \rangle$. And for each of those points $p_i$, the probability that the point $q_i$ sampled from $\mu_{p_i}$
is such that $\langle p_i, u \rangle \le \langle q_i, u \rangle$ (and thus $\langle q, u \rangle \le \langle q_i, u \rangle$) is $\ge 1/(2d+2)$. Hence, the probability that there is a
separating halfspace between $q$ and the convex hull of $Q$ (where the halfspace is orthogonal to some direction $u$) is

$$\le (1 - 1/(2d+2))^{n/(2d+2)} = ((1 - 1/(2d+2))^{1/(2d+2)})^n.$$ □

Theorem B.1. For a set of $m \le n$ point sets drawn i.i.d. from $\mu_P$, it follows that $q$ is in each of the $m$ convex hulls
of these point sets with high probability (specifically with probability $\ge 1 - m((1 - 1/(2d+2))^{1/(2d+2)})^n$).

Proof. Let $\beta = (1 - 1/(2d+2))^{1/(2d+2)}$. For any one point set the probability that $q$ is contained in the convex
hull is $\ge 1 - \beta^n$. By the union bound, the probability that it is contained in all $m$ convex hulls is $\ge (1 - \beta^n)^m =
1 - m\beta^n + \binom{m}{2}\beta^{2n} - \binom{m}{3}\beta^{3n} + \cdots$. Since $n \ge m$, the sum of all terms after the first two in the expansion increases
the probability. □
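
To get a feeling for how quickly these bounds approach 1, the following Python sketch simply evaluates the expressions from Lemma B.1 and Theorem B.1. The function names are ours and the code is a numerical illustration only, not part of the proof.

```python
def beta(d):
    """beta = (1 - 1/(2d+2))^(1/(2d+2)) as defined in the proof of Theorem B.1."""
    return (1.0 - 1.0 / (2 * d + 2)) ** (1.0 / (2 * d + 2))

def single_hull_bound(n, d):
    """Lower bound 1 - beta^n on the probability that q lies in one sampled convex hull."""
    return 1.0 - beta(d) ** n

def all_hulls_bound(n, d, m):
    """Lower bound 1 - m * beta^n on the probability that q lies in all m sampled hulls."""
    return 1.0 - m * beta(d) ** n
```

In the plane (d = 2), beta is the sixth root of 5/6, so beta^n decays geometrically in n and the bound for m hulls degrades only linearly in m.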

Because the summarizing shapes are convex, for any point $q'$ the line segment $\overline{qq'}$ is completely contained
in a convex summarizing shape that contains $q$ if and only if $q'$ is. Thus for every boundary of a summarizing shape that $\overline{qq'}$ crosses, $q'$ is
outside that summarizing shape. This implies the following corollary.

Corollary B.1. If the summarizing shape is convex, then the $\gamma$-layer, for $\gamma < 1 - 1/m$, exists, is connected, and is
star-shaped with high probability, specifically with probability $\ge 1 - m((1 - 1/(2d+2))^{1/(2d+2)})^n$.

C Shapes of $\mathcal{A}_{f,n}$ for Various Summarizing Shapes



Let $A$ be the intersection of $O(n)$ shapes from $\mathcal{A}_f$ where $(\mu_P, \mathcal{A}_f)$ has VC-dimension $\nu_f$. Let a wedge of $A$ be
a shape from $\mathcal{W}_{f,n}$ described by the intersection of $d$ hyperplanes and one shape from $\mathcal{A}_f$.

Lemma C.1. The VC-dimension of $(\mathbb{R}^d, \mathcal{W}_{f,n})$ is $d(d+1) + \nu_f$.

Proof. It is known that if a class of shapes $\mathcal{W}$ is formed as the intersection of $k$ subsets, one from each $(\mathbb{R}^d, \mathcal{A}_j)$ for $j \in
[1 : k]$, which have VC-dimension $\nu_j$, then $(\mathbb{R}^d, \mathcal{W})$ has VC-dimension $\sum_{j=1}^{k} \nu_j$ [12]. Since wedges are formed
by the intersection of $d$ halfspaces (VC-dimension $d+1$) and one shape from $\mathcal{A}_f$, it follows that $(\mathbb{R}^d, \mathcal{W}_{f,n})$ has
VC-dimension $d(d+1) + \nu_f$. □

If the $(d-2)$-dimensional faces of the boundary of $\mathcal{A}_{f,n}$ are subsets of $(d-2)$-dimensional flats (i.e. points in
$\mathbb{R}^2$ and line segments in $\mathbb{R}^3$), then any shape from $\mathcal{A}_{f,n}$ can be formed as the disjoint union of $O(n)$ wedges from
$\mathcal{W}_{f,n}$. Functions which produce such families $\mathcal{A}_{f,n}$ include seb2 and chp in $\mathbb{R}^2$ and diam and cha in $\mathbb{R}^d$.

Remark 1. In cases, such as seb2 and chp for $d > 2$, where we cannot form wedges, we can create similar shapes
for $\mathcal{W}_{f,n}$ as generalized cones whose boundary passes through the boundary of each $(d-1)$-dimensional facet of the
corresponding shape from $\mathcal{A}_{f,n}$. For these shapes the VC-dimension of $(\mathbb{R}^d, \mathcal{W}_{f,n})$ can be bounded as $O(\nu_f d \log d)$.
Each face of a generalized cone is described by two shapes from $\mathcal{A}$, which have VC-dimension $\nu_f$, and a point. Thus
the face of the generalized cone has shatter dimension $O(\nu_f)$. If a $(d-1)$-dimensional facet of the boundary of a
shape from $\mathcal{A}_{f,n}$ has more than $d$ faces of dimension $(d-2)$, then we can triangulate the facet so it has $O(d)$ such
faces. Thus the range space for the generalized cone has shatter dimension $O(\nu_f d)$ and VC-dimension $O(\nu_f d \log d)$.

The VC-dimension for (M'', Wj „), ^f, is shown for several functions in Table 2. 

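Lemma C.2. If the disjoint union of $m$ shapes from $\mathcal{W}_{f,n}$ can form any shape from $\mathcal{A}_{f,n}$, then an $(\varepsilon/m)$-sample of
$(\mu_P, \mathcal{W}_{f,n})$ is an $\varepsilon$-sample of $(\mu_P, \mathcal{A}_{f,n})$.

Proof. For any shape $A \in \mathcal{A}_{f,n}$ we can create a set of $m$ shapes $\{W_1, \ldots, W_m\} \subset \mathcal{W}_{f,n}$ whose disjoint union is $A$.
Since each range of $\mathcal{W}_{f,n}$ may have error $\varepsilon/m$, their union has error at most $\varepsilon$. □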

Hence for (/Xp, W/,„) ^-samples can be created of size 0(n2^ log f ) in time 0(n(f log^/ |). 



Table 1: Runtimes for $\varepsilon$-Quantizations of Various Summarizing Shape Families.

abbrv.   summarizing shape                                  randomized*       determ.                                  determ. $\mathbb{R}^d$
dwid     width along a fixed direction                      O(n;^logi)        O(nVe)                                   O(n^/e)
aabbp    axis-aligned bounding box measured by perimeter    O(n^logi)
aabba    axis-aligned bounding box measured by area         O(n^ log I)
seb∞     smallest enclosing ball, L∞ metric                 O(nilogi)
seb1     smallest enclosing ball, L1 metric                 O(nj^logi)
seb2     smallest enclosing ball, L2 metric                 O(n^logi)
diam     diameter                                           O(n2^ log i)      $\tilde{O}((n^5/\varepsilon^2)^{n+1})$
cha      convex hull measured by area                       O(n log log i)    $\tilde{O}((n^5/\varepsilon^2)^{n+1})$     $\tilde{O}((n^{d+3}/\varepsilon^2)^{n+1})$
chp      convex hull measured by perimeter                  O(nlogn^logi)     $\tilde{O}((n^5/\varepsilon^2)^{n+1})$     $\tilde{O}((n^{d+3}/\varepsilon^2)^{n+1})$

* all randomized results are correct with constant probability.
0{f{n, e)) ignores poly-logarithmic factors (log 5^^o(poiy((i)) ^ j^j. ^j^y r > 0. 



C.1 Examples

In the examples below, 7 points are given, on which we study a certain measure (e.g., diameter or convex hull area).
The grey region denotes the possible placements of a new point, such that the measure will not exceed a given value.
These regions illustrate $\mathcal{A}_{f,n}$ for various summarizing shapes.

Table 2: VC-dimension for Various Shape Families.

abbrv.   $(\mathbb{R}^2, \mathcal{A}_{f,n})$    $(\mathbb{R}^d, \mathcal{A}_{f,n})$              $(\mathbb{R}^2, \mathcal{W}_{f,n})$    $(\mathbb{R}^d, \mathcal{W}_{f,n})$
aabbp    $O(1)$ ($\sigma = 4$)    $O(d \log d)$ ($\sigma = 2d$)
aabba    8                        $O(d \log d)$ ($\sigma = 2d$)
seb∞     4                        $2d$
seb1     4                        $2d$
seb2     $\infty$                 $\infty$                           9                        $O(d^2 \log d)$
diam     $\infty$                 $\infty$                           9                        $d^2 + 2d + 1$
cha      $\infty$                 $\infty$                           7                        $d^2 + 2d + 1$
chp      $\infty$                 $\infty$                           $O(1)$                   $O(d^2 \log d)$



Axis-aligned bounding box. Figure 5 shows examples of $\mathcal{A}_{f,n}$ for axis-aligned bounding boxes, measuring
either by perimeter (aabbp) or by area (aabba) in $\mathbb{R}^2$. For both, $(\mathbb{R}^2, \mathcal{A}_{f,n})$ has a shatter dimension of 4 because the
shape is determined by the x-coordinates of 2 points and the y-coordinates of 2 points. This generalizes to a shatter
dimension of $2d$ for $(\mathbb{R}^d, \mathcal{A}_{f,n})$, where area generalizes to $d$-dimensional volume, and perimeter generalizes to the
$(d-1)$-volume of the boundary. We can also show the VC-dimension of $(\mathbb{R}^2, \mathcal{A}_{f,n})$ is 8 for aabbp because its shape
is defined by the intersection of halfspaces with 4 predefined normal directions at 0°, 45°, 90°, and 135°. This can
be generalized to higher dimensions.

Hence, for both shapes we can create n ^-samples of {|J,p^,Af^n) of size a/(n, = 0(p-log in time 
0(^ log^ |). For aabbp in M^, an |-sample of each {fj,p.,Af^n) of can be reduced further to size log^^ |) 
in total time 0(^ log^" Then we can construct the e-quantization in (n^'^/e'*'^)(log ^)O(d) ^jjQe, using orthogo- 
nal range searching. For aabbp in M^, the runtime improves to 0(n^/e^ log^^ ^). 

Smallest enclosing ball. Figure 6 shows examples of $\mathcal{A}_{f,n}$ for smallest enclosing ball, for metrics $L_\infty$ (seb∞),
$L_1$ (seb1), and $L_2$ (seb2) in $\mathbb{R}^2$. For seb∞ and seb1, $(\mathbb{R}^d, \mathcal{A}_{f,n})$ has VC-dimension $2d$ because the shapes are defined
by the intersection of halfspaces from $d$ predefined normal directions. For seb1 and seb∞, we can create $n$ $(\varepsilon/n)$-samples
of each $(\mu_{p_i}, \mathcal{A}_{f,n})$ of size $\alpha_f(n, \varepsilon) = O((n^2/\varepsilon^2) \log(n/\varepsilon))$ in total time $O((n^3/\varepsilon^2) \log^2(n/\varepsilon))$. The size for each can be reduced to
$O((n/\varepsilon) \log^{2d}(n/\varepsilon))$ in 0(^ log*^^ |) total time. Using an orthogonal range searching data structure we can calculate the
$\varepsilon$-quantization in $O((n^{2d+2}/\varepsilon^{d+1}) \, \mathrm{polylog}(n/\varepsilon))$ time.

For seb2, $(\mathbb{R}^d, \mathcal{A}_{f,n})$ has infinite VC-dimension, but $(\mathbb{R}^2, \mathcal{W}_{f,n})$ has VC-dimension $\le 9$ because it is the
intersection of 2 halfspaces and one disc. Any shape from $\mathcal{A}_{f,n}$ can be formed from the disjoint union of $2n$ wedges.
Choosing a point in the convex hull of the $n - 1$ points describing $\mathcal{A}_{f,n}$ as the vertex of the wedges will ensure
that each wedge is completely inside the ball that defines part of its boundary. Thus, in $\mathbb{R}^2$, the $n$ $(\varepsilon/n)$-samples of each
$(\mu_{p_i}, \mathcal{A}_{f,n})$ are of size $\alpha_f(n, \varepsilon) = O((n^4/\varepsilon^2) \log(n/\varepsilon))$ and can all be calculated in $O((n^5/\varepsilon^2) \log^2(n/\varepsilon))$ time. And then the
$\varepsilon$-quantization can be calculated in $O((n^{17.5}/\varepsilon^{7}) \, \mathrm{polylog}(n/\varepsilon))$ time, using range searching data structures.

Figure 5: (a) Axis-aligned bounding box, measured by perimeter. (b) Axis-aligned bounding box, measured by area.
The curves are hyperbola parts.

Figure 6: (a) Smallest enclosing ball, $L_\infty$ metric. (b) Smallest enclosing ball, $L_1$ metric. (c) Smallest enclosing ball,
$L_2$ metric. The curves are circular arcs of two different radii.

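For seb2 in $\mathbb{R}^d$, for $d \ge 3$, we can form shapes from $\mathcal{A}_{f,n}$ with disjoint unions of generalized cones from a family
$\mathcal{W}_{f,n}$, where $(\mathbb{R}^d, \mathcal{W}_{f,n})$ has shatter dimension $O(d^2)$. We need $O(n^{\lfloor d/2 \rfloor})$ such shapes from $\mathcal{W}_{f,n}$ to form one
shape $A \in \mathcal{A}_{f,n}$, because $A$ has boundary described by $O(n^{\lfloor d/2 \rfloor})$ sphere pieces with one of $d$ different radii. The
VC-dimension of each $(\mu_{p_i}, \mathcal{W}_{f,n})$ is $O(d^2 \log d)$ in $\mathbb{R}^d$, and we can create $n$ $(\varepsilon/n)$-samples of each $(\mu_{p_i}, \mathcal{A}_{f,n})$
of size $\alpha_f(n, \varepsilon) = O((n^{2+2\lfloor d/2 \rfloor}/\varepsilon^2) \log(n/\varepsilon))$ in $O((n^{3+2\lfloor d/2 \rfloor}/\varepsilon^2) \log^2(n/\varepsilon))$ total time. The $\varepsilon$-quantization can be
computed in $\tilde{O}((n^{d+3}/\varepsilon^2)^{d+2-1/d})$ time (ignoring boundary cases with floor operations) using a range searching
data structure.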

Diameter. Figure 7 shows an example of $\mathcal{A}_{f,n}$ for the diameter of a point set in $\mathbb{R}^2$. Here $(\mathbb{R}^d, \mathcal{A}_{f,n})$ has infinite
VC-dimension. It is formed by the intersection of balls of the same radius centered at the points. Thus a shape from
$\mathcal{A}_{f,n}$ is determined by at most n balls, and since they are each the same radius, we can construct a shape from $\mathcal{A}_{f,n}$
from the disjoint union of n wedges, as with seb2. And since each wedge is the intersection of $d$ halfspaces and
1 disc, $(\mathbb{R}^d, \mathcal{W}_{f,n})$ has VC-dimension $(d+1)^2$. Thus we can construct $n$ $(\varepsilon/n)$-samples for each $(\mu_{p_i}, \mathcal{A}_{f,n})$ of size
$\alpha_f(n, \varepsilon) = O((n^4/\varepsilon^2) \log(n/\varepsilon))$ in total time $O((n^5/\varepsilon^2) \log^2(n/\varepsilon))$. However, given a set of n points, the shape which defines
the set of points where the diameter will not increase has complexity $O(n)$. This is the family $\mathcal{A}_{f,n+1}$, the union of
n balls. This implies the size of the basis $\sigma_f$ used in Algorithm 6.1 is n. Hence, it takes time $\tilde{O}((n^5/\varepsilon^2)^{n+1})$
to construct the $\varepsilon$-quantization.

Convex hull. Figure 8 shows examples of $\mathcal{A}_{f,n}$ for the convex hull, measured either by area (cha) or perimeter
(chp) in $\mathbb{R}^2$. For both, $(\mathbb{R}^2, \mathcal{A}_{f,n})$ has infinite VC-dimension. For cha, $(\mathbb{R}^2, \mathcal{W}_{f,n})$ has VC-dimension 7, because
wedges are triangles. In higher dimensions cha can continue to use wedges, but needs $O(n^{\lfloor d/2 \rfloor})$ of them. For chp,
the wedge's boundary is described by $d$ hyperplanes and an ellipse boundary part. We cannot guarantee that the
intersection of all of these parts describes the wedge because the ellipse may be too small and may cut off part of the
intersection of halfspaces. But in $\mathbb{R}^2$ the wedge clearly does have shatter dimension 4 + 5, so the VC-dimension of
$(\mathbb{R}^2, \mathcal{W}_{f,n})$ is $O(1)$. In higher dimensions we can use generalized cone shapes with VC-dimension $O(d^2 \log d)$ and
we may need $O(n^{\lfloor d/2 \rfloor})$ of them.

Figure 7: Diameter. The curves are circular arcs all of the same radius.

For both cha and chp we can calculate $n$ $(\varepsilon/n)$-samples for each $(\mu_{p_i}, \mathcal{A}_{f,n})$ of size $\alpha_f(n, \varepsilon) = O((n^{2+2\lfloor d/2 \rfloor}/\varepsilon^2) \log(n/\varepsilon))$
in $O((n^{3+2\lfloor d/2 \rfloor}/\varepsilon^2) \log^2(n/\varepsilon))$ total time. However, given a set of n points, the shape which defines the set of points
where the convex hull will not increase has complexity $O(n)$. This is the family $\mathcal{A}_{f,n+1}$. Like diam, this implies
the size of the basis $\sigma_f$ used in Algorithm 6.1 is n. Hence, it takes time $\tilde{O}((n^{d+3}/\varepsilon^2)^{n+1})$ to construct the
$\varepsilon$-quantizations.



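Table 3: $\varepsilon$-Samples for Summarizing Shape Family

abbrv.   $\alpha_f(n,\varepsilon)$   $\eta = n\alpha_f(n,\varepsilon)$   $\sigma_f$   $\eta^{\sigma_f}$                              $RS_f(n,\varepsilon)$                          runtime
aabbp                                $O(n^3/\varepsilon^2)$              $2d$         $O(n^{6d}/\varepsilon^{4d})$
aabba                                $O(n^3/\varepsilon^2)$              $2d$         $O(n^{6d}/\varepsilon^{4d})$                   $O(1)$
seb∞                                 $O(n^2/\varepsilon)$                $d+1$        $\tilde{O}(n^{2d+2}/\varepsilon^{d+1})$        $O(1)$                                         $\tilde{O}(n^{2d+2}/\varepsilon^{d+1})$
seb1                                 $O(n^2/\varepsilon)$                $d+1$                                                       $O(1)$                                         $\tilde{O}(n^{2d+2}/\varepsilon^{d+1})$
seb2                                                                     $d+1$        $\tilde{O}((n^{d+3}/\varepsilon^2)^{d+1})$     $\tilde{O}((n^{d+3}/\varepsilon^2)^{1-1/d})$
diam                                                                     $n$          $O((n^5/\varepsilon^2)^{n})$                                                                  $\tilde{O}((n^5/\varepsilon^2)^{n+1})$
cha                                  $O(n^{d+3}/\varepsilon^2)$          $n$          $O((n^{d+3}/\varepsilon^2)^{n})$                                                              $\tilde{O}((n^{d+3}/\varepsilon^2)^{n+1})$
chp                                                                      $n$

$\tilde{O}(f(n, \varepsilon))$ ignores poly-logarithmic factors $(\log \frac{n}{\varepsilon})^{O(\mathrm{poly}(d))}$.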




Figure 8: (a) Convex hull, measured by perimeter. The curves are ellipse parts. (b) Convex hull, measured by area.